-
Notifications
You must be signed in to change notification settings - Fork 243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add HtsgetBAMFileReader #1494
Add HtsgetBAMFileReader #1494
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1494 +/- ##
===============================================
+ Coverage 69.279% 69.325% +0.046%
- Complexity 8764 8888 +124
===============================================
Files 590 601 +11
Lines 34755 35433 +678
Branches 5800 5901 +101
===============================================
+ Hits 24078 24564 +486
- Misses 8386 8538 +152
- Partials 2291 2331 +40 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andersleung I promised to get this done today but other stuff came up and then it got late. Here's my first round of comments but I'll probably have more tomorrow. It looks generally good so far but I think we want to change some of the handling of the Input resources.
* @return false, since htsget sources never have indices | ||
*/ | ||
@Override | ||
public boolean hasIndex() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be problematic since this method is used in 2 different ways. The first and most important is to mean "can I query on this reader or do I have to iterate over it". Obviously htsget supports queries without the index. The second way this is used though is "let me check if I can locate the index for this file so I can perform some operation on it. That's very rarely used but it is used internally by the sam reader to guard a check to getIndex.
We might want to add a new method "isQueryable" or something like that to the interface so we can split that concept up.
returning true here will be a lie but will work better with almost everything. we'd have to make getIndex return null then I think.
Either way it's unfortunate.
@@ -523,6 +548,11 @@ void applyTo(final CRAMFileReader underlyingReader, final SamReader reader) { | |||
void applyTo(final SRAFileReader underlyingReader, final SamReader reader) { | |||
underlyingReader.enableIndexCaching(true); | |||
} | |||
|
|||
@Override | |||
void applyTo(final HtsgetBAMFileReader underlyingReader, final SamReader reader) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This design is so crufty. It seems like someone started refactoring it to use ReaderImplementation so we don't need these custom handlers here as well but then never actually did it. (Just me venting... nothing to do here...)
|
||
/** | ||
* Class allowing deserialization from json htsget error response | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: stopping here for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andersleung More comments. I think I got everything this time.
src/main/java/htsjdk/samtools/util/htsget/HtsgetErrorResponse.java
Outdated
Show resolved
Hide resolved
import java.io.IOException; | ||
import java.net.URI; | ||
|
||
public class HtsgetBAMFileReaderTest extends HtsjdkTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to run these tests with a variety of settings on the htsget reader. Particularly I'd like to see them with async on for the htsget reader since it seems plausible that there could be potential issues there. You could add a datprovider that has multiple readers.
The readers also need to be closed at the end.
Typically I'm in favor of not reusing the same objects for multiple tests and generating them fresh each time, it helps avoid test ordering problems or if 1 test puts the objects into a broken state. This should be OK though.
public class HtsgetResponseUnitTest extends HtsjdkTest { | ||
@Test | ||
public void testDeserialization() { | ||
final String respJson = "{\"htsget\":{\"format\":\"BAM\",\"urls\":[{\"url\":\"data:application/vnd.ga4gh.bam;base64,QkFNAQ==\",\"class\":\"header\"},{\"url\":\"https://htsget.blocksrv.example/sample1234/header\",\"class\":\"header\"},{\"url\":\"https://htsget.blocksrv.example/sample1234/run1.bam\",\"headers\":{\"Authorization\":\"Bearer xxxx\",\"Range\":\"bytes=65536-1003750\"},\"class\":\"body\"},{\"url\":\"https://htsget.blocksrv.example/sample1234/run1.bam\",\"headers\":{\"Authorization\":\"Bearer xxxx\",\"Range\":\"bytes=2744831-9375732\"},\"class\":\"body\"}],\"md5\":\"blah\"}}"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't wait until java adds multiline strings...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andersleung I think this is looking good overall, but I think the prefetcher could use some changes. I think there are a number of race conditions and generally the error handling needs to be improved.
@@ -0,0 +1,5 @@ | |||
This folder contains scripts and files necessary for starting a local htsget reference server so that htsget functionality can be tested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, this is helpful
src/main/java/htsjdk/samtools/util/htsget/SAMRecordPrefetchingIterator.java
Outdated
Show resolved
Hide resolved
src/main/java/htsjdk/samtools/util/htsget/SAMRecordPrefetchingIterator.java
Outdated
Show resolved
Hide resolved
src/main/java/htsjdk/samtools/util/htsget/SAMRecordPrefetchingIterator.java
Outdated
Show resolved
Hide resolved
src/main/java/htsjdk/samtools/util/htsget/SAMRecordPrefetchingIterator.java
Outdated
Show resolved
Hide resolved
58797b4
to
042f50b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andersleung A few minor comments. Looks good to me after those are resolved.
src/main/java/htsjdk/samtools/util/SAMRecordPrefetchingIterator.java
Outdated
Show resolved
Hide resolved
* Note that this implementation is not synchronized. If multiple threads | ||
* access an instance concurrently, it must be synchronized externally. | ||
*/ | ||
public class SAMRecordPrefetchingIterator implements CloseableIterator<SAMRecord> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. I have two small comments. I don't see any concurrency issues so hopefully that means that there aren't any :)
This is nicer than the last implementation.
if (this.backgroundThread == null) return; | ||
/* | ||
If prefetch thread is interrupted while awake and before acquiring permits, it will either acquire the permits | ||
and pass through to the next case, or check interruption status before sleeping then exit immediately |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed explanation.
src/test/java/htsjdk/samtools/util/SAMRecordPrefetchingIteratorTest.java
Outdated
Show resolved
Hide resolved
|
||
import java.util.stream.IntStream; | ||
|
||
public class SAMRecordPrefetchingIteratorTest extends HtsjdkTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for writing these tests!
@@ -65,6 +67,7 @@ private void prefetch() { | |||
// InterruptedException is expected if the iterator is being closed | |||
return; | |||
} catch (final Throwable t) { | |||
t.printStackTrace(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should only be printed in the case of an Error, not in the normal case of an Exception.
co-authored-by: Florian Reisinger <[email protected]>
Description
Add a new SamReader type that is able to request reads from HTSget sources, as well as classes defining the content of an HTSget request and response. This change allows consumers of the htsjdk library to use HTSget sources like any other type of reads source.
Things to think about before submitting: